|
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable ''y'' and one or more explanatory variables (or independent variables) denoted ''X''. The case of one explanatory variable is called ''simple linear regression''. For more than one explanatory variable, the process is called ''multiple linear regression''. (This term should be distinguished from ''multivariate linear regression'', where multiple correlated dependent variables are predicted, rather than a single scalar variable.)〔.〕 In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called ''linear models''. Most commonly, the conditional mean of ''y'' given the value of ''X'' is assumed to be an affine function of ''X''; less commonly, the median or some other quantile of the conditional distribution of ''y'' given ''X'' is expressed as a linear function of ''X''. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of ''y'' given ''X'', rather than on the joint probability distribution of ''y'' and ''X'', which is the domain of multivariate analysis. Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine. Linear regression has many practical uses. Most applications fall into one of the following two broad categories: * If the goal is prediction, or forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of ''y'' and ''X'' values. After developing such a model, if an additional value of ''X'' is then given without its accompanying value of ''y'', the fitted model can be used to make a prediction of the value of ''y''. * Given a variable ''y'' and a number of variables ''X''1, ..., ''X''''p'' that may be related to ''y'', linear regression analysis can be applied to quantify the strength of the relationship between ''y'' and the ''X''''j'', to assess which ''X''''j'' may have no relationship with ''y'' at all, and to identify which subsets of the ''X''''j'' contain redundant information about ''y''. Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares loss function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous. ==Introduction to linear regression== Given a data set of ''n'' statistical units, a linear regression model assumes that the relationship between the dependent variable ''yi'' and the ''p''-vector of regressors ''xi'' is linear. This relationship is modeled through a ''disturbance term'' or ''error variable'' ''εi'' — an unobserved random variable that adds noise to the linear relationship between the dependent variable and regressors. Thus the model takes the form : where T denotes the transpose, so that ''xi''T''β'' is the inner product between vectors ''xi'' and ''β''. Often these ''n'' equations are stacked together and written in vector form as : where : : : Some remarks on terminology and general use: * is called the ''regressand'', ''endogenous variable'', ''response variable'', ''measured variable'', ''criterion variable'', or ''dependent variable'' (see dependent and independent variables.) The decision as to which variable in a data set is modeled as the dependent variable and which are modeled as the independent variables may be based on a presumption that the value of one of the variables is caused by, or directly influenced by the other variables. Alternatively, there may be an operational reason to model one of the variables in terms of the others, in which case there need be no presumption of causality. * are called ''regressors'', ''exogenous variables'', ''explanatory variables'', ''covariates'', ''input variables'', ''predictor variables'', or ''independent variables'' (see dependent and independent variables, but not to be confused with independent random variables). The matrix is sometimes called the design matrix. * * Usually a constant is included as one of the regressors. For example we can take ''x''''i''1 = 1 for ''i'' = 1, ..., ''n''. The corresponding element of ''β'' is called the ''intercept''. Many statistical inference procedures for linear models require an intercept to be present, so it is often included even if theoretical considerations suggest that its value should be zero. * * Sometimes one of the regressors can be a non-linear function of another regressor or of the data, as in polynomial regression and segmented regression. The model remains linear as long as it is linear in the parameter vector ''β''. * * The regressors ''x''''ij'' may be viewed either as random variables, which we simply observe, or they can be considered as predetermined fixed values which we can choose. Both interpretations may be appropriate in different cases, and they generally lead to the same estimation procedures; however different approaches to asymptotic analysis are used in these two situations. * is a ''p''-dimensional ''parameter vector''. Its elements are also called ''effects'', or ''regression coefficients''. Statistical estimation and inference in linear regression focuses on ''β''. The elements of this parameter vector are interpreted as the partial derivatives of the dependent variable with respect to the various independent variables. * is called the ''error term'', ''disturbance term'', or ''noise''. This variable captures all other factors which influence the dependent variable ''y''''i'' other than the regressors ''x''''i''. The relationship between the error term and the regressors, for example whether they are correlated, is a crucial step in formulating a linear regression model, as it will determine the method to use for estimation. Example. Consider a situation where a small ball is being tossed up in the air and then we measure its heights of ascent ''hi'' at various moments in time ''ti''. Physics tells us that, ignoring the drag, the relationship can be modeled as : where ''β''1 determines the initial velocity of the ball, ''β''2 is proportional to the standard gravity, and ''ε''''i'' is due to measurement errors. Linear regression can be used to estimate the values of ''β''1 and ''β''2 from the measured data. This model is non-linear in the time variable, but it is linear in the parameters ''β''1 and ''β''2; if we take regressors ''x''''i'' = (''x''''i''1, ''x''''i''2) = (''t''''i'', ''t''''i''2), the model takes on the standard form : 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Linear regression」の詳細全文を読む スポンサード リンク
|